Abstract. The 2-star model is the simplest exponential random graph model that displays
complex behavior, such as degeneracy and phase transitions. Despite its importance, this model
has been solved only in the regime of dense connectivity. In this work we solve the model in the
finite connectivity regime, which is far more prevalent in real-world networks. We show that the model
undergoes a condensation transition from a liquid to a condensate phase along the critical line
corresponding, in the space of ensemble parameters, to the Erdős-Rényi graphs. In the liquid phase
the model can produce graphs with narrow degree statistics, ranging from regular to Erdős-Rényi
graphs, while in the condensed phase the “excess” degree heterogeneity condenses on a single site
with degree $\sim\sqrt{N}$. This shows that the two-star model, in its standard definition, is unsuitable
for producing arbitrary finitely connected graphs with degree heterogeneity higher than that of Erdős-Rényi
graphs, and suggests that non-pathological variants of this model may be attained by softening
the global constraint on the two-stars while keeping a hard constraint on the number of links.


1. Introduction 

Exponential random graph models (ERGM) are ensembles of random graphs where each graph
configuration $\mathbf{c}$ appears with a probability $p(\mathbf{c}) \propto e^{-H(\mathbf{c})}$ given by the Gibbs-Boltzmann
distribution, where $H(\mathbf{c})$ is the graph Hamiltonian, encoding several properties of the networks in
the ensemble. First introduced in the 1980s by Holland and Leinhardt [22], and further developed
by Frank and Strauss [19] and in several later studies [HEIIEIEIEZ], ERGM soon became popular
in social network analysis. Computational tools to analyze and simulate networks based on ERGM
are widely available on the web, such as the ERGM and SIENA packages, and several paradigmatic
models of random graphs can be written in the exponential form, for suitable choices of the
graph Hamiltonian, including the Erdős-Rényi (ER) graph ensemble [20] and the ensemble of
graphs with soft-constrained degree sequence [8], HJ [16]. However, ERGM have known drawbacks
which limit their practical use as proxies or null models for real networks. In particular, they
may display degenerate behaviour and may fail to produce graphs with properties within certain
ranges, which are nevertheless observed in nature.

A useful model for understanding the degenerate behavior and the limitations of ERGM is the
two-star model, which is simple enough to be amenable to exact solution, while exhibiting the
interesting behavior of more complex ERGM. A well-known solution for the two-star model has
been developed, in the limit of large system size $N$, by Park and Newman [12] within a mean-field
approach and expansions around it. However, this approach requires that the connectivity
of the graph is $\mathcal{O}(N)$, a condition which is hardly met by real-world networks, like social and
biological networks, where the connectivity is $\mathcal{O}(N^0)$. In this work we solve the two-star model
exactly, in the finite connectivity regime, for large system size, by using path integrals. We check
our calculations against Monte Carlo simulations and compare our results with those predicted by
mean-field theories. Results show that the 2-star model undergoes a condensation transition from
a liquid to a condensate state, along the critical line corresponding, in the space of ensemble
parameters, to the Erdős-Rényi graphs. In the liquid phase the model can only produce graphs within
a narrow range of degree statistics, namely between regular and Erdős-Rényi (ER), whereas in
the condensed phase a condensate of size $\sim\sqrt{N}$ emerges, residing on a single site.

The paper is structured as follows: in Sec. 2 we define the model and review the mean-field
solution; in Sec. 3 we solve the two-star model in the finite connectivity regime and compare
our results with the mean-field predictions and with Monte Carlo simulations. In Sec. 4 we
show that the phase transition displayed by the system in the finite connectivity regime is a
condensation, related to large fluctuations in the sums of random variables. Finally, we summarise
our conclusions in Sec. 5.

2. The model and the mean-field solution 

The 2-star model is a prototypical example of ERGM. In this section, we give a brief review of
ERGM, the 2-star model and its mean-field solution.

2.1. Background: the ERGM 

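For simple and undirected graphs of $N$ nodes, ERGM are graph ensembles where each graph
is defined by an adjacency matrix $\mathbf{c}$, with elements $c_{ij} \in \{0,1\}\ \forall\, i,j = 1,\dots,N$, $c_{ii} = 0\ \forall\, i$,
$c_{ij} = c_{ji}\ \forall\, i < j$. In ERGM, each graph $\mathbf{c}$ appears with a probability given by the Gibbs-Boltzmann
distribution

$$p(\mathbf{c}) = \frac{e^{-H(\mathbf{c})}}{Z} \tag{1}$$

with Hamiltonian $H(\mathbf{c})$ and partition function $Z = \sum_{\mathbf{c}} e^{-H(\mathbf{c})}$. The Hamiltonian is given by

$$H(\mathbf{c}) = -\sum_{\mu=1}^{K} \lambda_\mu \Omega_\mu(\mathbf{c}) \tag{2}$$

where $\boldsymbol{\Omega}(\mathbf{c}) = (\Omega_1(\mathbf{c}),\dots,\Omega_K(\mathbf{c}))$ is a set of observables of the graph $\mathbf{c}$ for which one has statistical
estimates $\boldsymbol{\Omega} = (\Omega_1,\dots,\Omega_K)$, so that

$$\langle \Omega_\mu(\mathbf{c}) \rangle = \Omega_\mu \qquad \forall\, \mu = 1,\dots,K \tag{3}$$

with $\langle \Omega_\mu(\mathbf{c}) \rangle = \sum_{\mathbf{c}} p(\mathbf{c})\,\Omega_\mu(\mathbf{c})$, and $\boldsymbol{\lambda} = (\lambda_1,\dots,\lambda_K)$ are ensemble parameters, also called
“conjugated” observables, which have to be calculated from the equations for the constraints
(3). The distribution (1) with Hamiltonian (2) maximises the Shannon entropy

$$S = -\sum_{\mathbf{c}} p(\mathbf{c}) \ln p(\mathbf{c}) \tag{4}$$

subject to the constraints (3) and the normalization $\sum_{\mathbf{c}} p(\mathbf{c}) = 1$. Hence, ERGM are maximum
entropy ensembles conditioned on the imposed constraints.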

For all but the simplest ERGM, exact solutions for the ensemble parameters are not available,
so one normally has to resort to numerical methods. These have been the subject
of intense investigation over the last few decades and range from pseudo-likelihood estimation
[ISl Cl El EU IMl ES] to Markov Chain Monte Carlo Maximum Likelihood techniques [28, 11, 26, 23],
to Bayesian inference mmm-

Once the values of the parameters $\boldsymbol{\lambda}$ are available, one can use the resulting probability
distribution $p(\mathbf{c})$ to estimate the value of observables for which no estimate is available. We note
that expectation values for the primary network observables are given by

$$\langle \Omega_\mu \rangle = -\frac{\partial F}{\partial \lambda_\mu} \tag{5}$$

where $F(\boldsymbol{\lambda}) = -\ln Z(\boldsymbol{\lambda})$ is the free-energy, and their fluctuations are given by the second
derivative of the free-energy, $\langle \Omega_\mu^2 \rangle - \langle \Omega_\mu \rangle^2 = -\partial^2 F/\partial \lambda_\mu^2$, which, in equilibrium, equals the so-called
susceptibility, defined as $\partial \langle \Omega_\mu \rangle/\partial \lambda_\mu$, and measuring the deviation in $\langle \Omega_\mu \rangle$ when a
change in the external variable $\lambda_\mu$ is applied. These relations suggest that in general we are able
to calculate the ensemble parameters from the equations for the constraints, if we can calculate
the free energy of the ERGM. This is straightforward only for Hamiltonians which are linear in
the adjacency matrix, where the graph distribution factorizes over the links, $p(\mathbf{c}) = \prod_{i<j} p(c_{ij})$.
For non-linear Hamiltonians, like the two-star model and the Strauss model [7], solutions have been
found within mean-field approximations and expansions about them [laiii].

2.2. The 2-star model 

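In the two-star model, one chooses, as graph observables, the number of links

$$L(\mathbf{c}) = \sum_{i<j} c_{ij} \tag{6}$$

and the number of two-stars (i.e. paths of length two)

$$S(\mathbf{c}) = \frac{1}{2} \sum_{j} \sum_{i \neq k} c_{ij} c_{jk} \tag{7}$$

and assumes that the expectations $L = \langle L(\mathbf{c}) \rangle$ and $S = \langle S(\mathbf{c}) \rangle$ are known. This leads to the
non-linear network Hamiltonian

$$H(\mathbf{c}) = -\theta_1 L(\mathbf{c}) - \theta_2 S(\mathbf{c}) = -\frac{1}{2}\sum_{i \neq j} c_{ij} \Big( \theta_1 + \theta_2 \sum_{k \neq i} c_{jk} \Big) = -\frac{1}{2}\sum_{i \neq j} c_{ij} \Big[ (\theta_1 - \theta_2) + \theta_2 \sum_{k} c_{jk} \Big] \tag{8}$$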
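where we set $\alpha = (\theta_1 - \theta_2)/2$ and $\beta = \theta_2/2$. Alternatively, we can define the local degrees of
graph $\mathbf{c}$ as $k_i(\mathbf{c}) = \sum_j c_{ij}\ \forall\, i$, and write the observables (6) and (7) as

$$L(\mathbf{c}) = \frac{1}{2}\sum_i k_i(\mathbf{c}), \qquad S(\mathbf{c}) = \frac{1}{2}\sum_i k_i(\mathbf{c})\,[k_i(\mathbf{c}) - 1] \tag{9}$$

so that the Hamiltonian reads

$$H(\mathbf{c}) = -\alpha \sum_i k_i(\mathbf{c}) - \beta \sum_i k_i^2(\mathbf{c}) \tag{10}$$

At this point it is useful to define the average connectivity $\bar{k}(\mathbf{c}) = N^{-1} \sum_i k_i(\mathbf{c})$ and the expected
average connectivity in the ensemble $\langle k \rangle = \langle \bar{k}(\mathbf{c}) \rangle$, with $\langle \cdot \rangle$ denoting the average over the ensemble
probability $p(\mathbf{c})$; the second moment $\langle k^2 \rangle = \langle N^{-1}\sum_i k_i^2(\mathbf{c}) \rangle$ is defined analogously. In the two-star
model, we have $L = N\langle k \rangle/2$ and $S = N(\langle k^2 \rangle - \langle k \rangle)/2$, hence
we constrain $\langle k \rangle$ and $\langle k^2 \rangle$.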
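The equivalence between the link/star form and the degree form of the Hamiltonian can be checked on a small graph. The following is a minimal sketch, not taken from the paper: the function names and the five-node example graph are illustrative assumptions. It counts two-stars directly from the adjacency matrix and compares the result with the degree-based expressions.

```python
import numpy as np

def example_graph():
    """Hypothetical 5-node test graph: edges 0-1, 1-2, 1-3, 3-4."""
    c = np.zeros((5, 5), dtype=int)
    for i, j in [(0, 1), (1, 2), (1, 3), (3, 4)]:
        c[i, j] = c[j, i] = 1
    return c

def degrees(c):
    """Degree sequence k_i = sum_j c_ij."""
    return c.sum(axis=1)

def count_links(c):
    """L(c) = sum_{i<j} c_ij."""
    return int(np.triu(c, k=1).sum())

def count_two_stars(c):
    """S(c): pairs of links sharing a centre j, counted directly from c."""
    n = c.shape[0]
    total = 0
    for j in range(n):                  # j is the centre of the two-star
        for i in range(n):
            for k in range(i + 1, n):   # unordered pair of legs {i, k}
                if i != j and k != j:
                    total += c[i, j] * c[j, k]
    return int(total)

def hamiltonian_degrees(c, alpha, beta):
    """H(c) = -alpha * sum_i k_i - beta * sum_i k_i^2."""
    k = degrees(c)
    return float(-alpha * k.sum() - beta * (k ** 2).sum())

def hamiltonian_links_stars(c, theta1, theta2):
    """H(c) = -theta1 * L(c) - theta2 * S(c)."""
    return float(-theta1 * count_links(c) - theta2 * count_two_stars(c))
```

With $\alpha = (\theta_1 - \theta_2)/2$ and $\beta = \theta_2/2$ the two Hamiltonians agree on any simple undirected graph; the triple loop is deliberately naive and only meant for small test cases.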

The two-star model has so far been solved for dense graphs, with average connectivity
$\langle k \rangle = \mathcal{O}(N)$, for which mean-field approaches are exact in the thermodynamic limit. The
mean-field solution (see Section 2.3) shows that for large $N$, and $\beta N = 2J$, the two-star model
undergoes, for any $J > 1$, a first-order phase transition between a phase of low connectivity
and one of high connectivity, separated by the critical line $J = -\alpha$ in the space of ensemble
parameters. The critical line terminates at the critical point $J = 1$, where a second-order phase
transition takes place from the symmetry-broken state with two phases, one with high and
one with low connectivity, to a symmetric state mM-. In particular, for $J > 1$ the
link density is discontinuous, meaning that the model fails to produce intermediate connectivities
for any choice of the ensemble parameters. Along the critical line $J = -\alpha$ (and for $J > 1$)
one has degeneracy, meaning that for the same ensemble parameters the model produces either
a sparse or a dense graph.

However, it is not clear a priori how well the mean-field scenario applies to the finite
connectivity regime, where Gaussian fluctuations (around the mean-field solution) become
dominant rather than a small perturbation around the leading order statistics. In addition,
we are interested in establishing whether in the phase at low connectivity the model can produce
arbitrary densities of stars for any given (finite) connectivity, by suitably choosing the ensemble
parameters. More generally, we aim to establish whether the 2-star model may serve as a plausible
null model for finitely connected networks, which are far more prevalent than dense networks in
the real world. We answer these questions in Section 3 by solving the two-star model
exactly, in the limit $N \to \infty$, for finite connectivity $\langle k \rangle = \mathcal{O}(N^0)$.


2.3. The mean-field approach 

In this section we briefly illustrate the mean-field approach, which is exact, in the thermodynamic
limit, for dense graphs [13], but is expected to give inaccurate results for finite connectivity. By
analogy with spin models, we can regard the expression in the brackets of (8) as the local field
acting on the link $c_{ij}$. The mean-field approximation replaces the local fields with their ensemble
averages, i.e. $c_{jk} \to \langle c_{jk} \rangle$. Doing so, all edges in the model become equivalent and the average
probability to observe a link is the same for all links, $p = \langle c_{jk} \rangle\ \forall\, j,k$. Hence, the Hamiltonian
becomes

$$H(\mathbf{c}) = -A \sum_{i<j} c_{ij} \tag{11}$$

where

$$A = 2\alpha + 2\beta(N-1)p \simeq 2\alpha + 2\beta N p \tag{12}$$

is now a function of the unknown probability $p$ and the last approximation holds for $N \gg 1$. The
Hamiltonian (11) leads to the partition function

$$Z = \sum_{\mathbf{c}} e^{A \sum_{i<j} c_{ij}} = \prod_{i<j} \sum_{c_{ij}} e^{A c_{ij}} = \prod_{i<j} (1 + e^{A}) = (1 + e^{A})^{N(N-1)/2} \tag{13}$$

which immediately gives the free energy

$$F = -\ln Z = -\frac{N(N-1)}{2} \ln(1 + e^{A}) \tag{14}$$

that can be used to write the equation for the constraint

$$L = -\frac{\partial F}{\partial A} = \frac{N(N-1)}{2}\,\frac{e^{A}}{1 + e^{A}}. \tag{15}$$

If links are all drawn randomly and independently with probability $p$, we have

$$L = \frac{N(N-1)}{2}\, p \tag{16}$$

hence we get from (15)

$$p = \frac{e^{A}}{1 + e^{A}} = \frac{1}{2}\left[ 1 + \tanh\frac{A}{2} \right] \tag{17}$$

where in the last equality we used the identity $e^{x}/(2\cosh x) = (1 + \tanh x)/2$. Now, however, $A$ is
a function of $p$, so the above gives a self-consistency equation for $p$


$$p = \frac{1}{2}\left[ \tanh(\beta N p + \alpha) + 1 \right] \tag{18}$$

This equation is identical to the one found by Park and Newman by solving the model exactly
in the dense regime, i.e. for $p = \mathcal{O}(1)$, and by setting $\beta N = 2J$

$$p = \frac{1}{2}\left[ \tanh(2Jp + \alpha) + 1 \right] \tag{19}$$
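The self-consistency equation $p = \frac{1}{2}[\tanh(2Jp + \alpha) + 1]$ can be explored numerically. The sketch below is an illustrative assumption, not the procedure used in the paper: it applies plain fixed-point iteration from two different starting points, and the function names are hypothetical. On the line $\alpha = -J$ both starting points reach the same solution for $J < 1$, while for $J > 1$ they end up in the low-density and high-density solutions respectively.

```python
import math

def mean_field_rhs(p, J, alpha):
    """Right-hand side of p = (1/2) [tanh(2 J p + alpha) + 1]."""
    return 0.5 * (math.tanh(2.0 * J * p + alpha) + 1.0)

def solve_mean_field(J, alpha, p0, tol=1e-13, max_iter=100000):
    """Iterate p -> rhs(p) from the initial guess p0 until it stops moving.

    Only stable solutions can be reached this way; the unstable middle
    solution that exists for J > 1 is never found by this iteration.
    """
    p = p0
    for _ in range(max_iter):
        p_new = mean_field_rhs(p, J, alpha)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Unique solution for J < 1, two stable solutions for J > 1 (alpha = -J).
p_low_a, p_high_a = solve_mean_field(0.5, -0.5, 0.01), solve_mean_field(0.5, -0.5, 0.99)
p_low_b, p_high_b = solve_mean_field(2.0, -2.0, 0.01), solve_mean_field(2.0, -2.0, 0.99)
```

Scanning the starting point, or $\alpha$ at fixed $J$, is a simple way to see the jump in link density across the line $\alpha = -J$.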


As noted in [13], a unique solution to (19) exists only for $J < 1$. For $J > 1$ and $\alpha$ sufficiently close
to $-J$ there are three solutions, with only the outer two being stable, leading to a degeneracy in
the solution. Expansions around the mean-field solution and perturbation theories [ni[ii] give
for the first two moments of the degree distribution

$$\langle k \rangle = Np + \frac{4Jp(1-p)(1-2p)}{[1 - 8Jp(1-p)]\,[1 - 4Jp(1-p)]} \tag{20}$$

$$\langle k^2 \rangle = N^2p^2 + \frac{Np(1-p)(1 - 8Jp^2)}{[1 - 8Jp(1-p)]\,[1 - 4Jp(1-p)]} \tag{21}$$














The two-star model: exact solution in the sparse regime and condensation transition 




where the second terms on the RHS originate from the Gaussian fluctuations about the mean-field
solution. These are subleading for large $N$, in the dense regime where $p = \mathcal{O}(1)$ and $J = \mathcal{O}(1)$.
However, in the finite connectivity regime where $p = \mathcal{O}(N^{-1})$ and $\beta = \mathcal{O}(1)$ (see Appendix A),
Gaussian fluctuations are no longer small fluctuations about the leading order statistics. Upon
setting $p = c/N$ and sending $N \to \infty$ at constant $\beta$ and $c$, we get


{k) =c 


(P) = c 


1 + 


c 


(1 


2/?c)(l 

1 


/3 c) 


( 22 ) 


(23) 


(l-2/3c)(l-/3c)_ 

showing that the mean-field approximation becomes inexact in the finite connectivity regime. For
a full discussion of mean-field predictions in the finite connectivity regime see Appendix A.


2.4. Upper and lower bounds on the total number of stars in the finite connectivity regime

Before solving the model in the finite connectivity regime, it is useful to derive expressions for the
upper and lower physical bounds on the number of stars that the model can exhibit at a given
finite connectivity. The expected number of 2-stars is $S = N(\langle k^2 \rangle - \langle k \rangle)/2$. For a fixed number of
edges $L$, the total number of stars is minimised by minimising the second moment of the degree
distribution while keeping the first moment fixed, i.e. by making the degree distribution regular,
so that $\langle k^2 \rangle = \langle k \rangle^2$ attains its physical minimum. This corresponds to the minimum star density

$$\frac{S_{\min}}{N} = \frac{1}{2}\langle k \rangle\,(\langle k \rangle - 1) \tag{24}$$

The total number of stars is maximised by maximising the second moment while keeping the
first moment fixed, resulting in a small number of vertices having very large degrees while all the
others have low degrees. To see this, we consider the following iterative process. We pick two
vertices $i$ and $j$ and assume that $k_i > k_j$. If $j$ has a neighbour which is not already connected to
$i$, then this edge is rewired to increase $k_i$ by 1 and decrease $k_j$ by 1. This step is repeated until,
in every pair, the vertex with the lesser degree has no more 'spare' edges to rewire to the other
vertex. In the case where $L < N - 1$ this always ends with a graph where there is one vertex
with degree $L$, $L$ vertices connected to it with degree 1, and any leftover vertices having degree
0 (because any other configuration would contain 'spare' edges). If we increase $L$ just beyond
$N$, we can no longer increase the stars by rewiring to the original hub, as that hub is already
connected to every other vertex. This causes a new hub to form to take up the extra edges. This
process of hubs being born and increasing in size until they have degree $N$ keeps going as $L$ is
increased until the graph is complete.
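The two extremes described above can be compared at the level of degree sequences. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the regular sequence is assumed to be graphical (i.e. $Nk$ even), and the single-hub construction only covers the case of at most $N - 1$ links.

```python
def two_stars_from_degrees(degree_sequence):
    """S = sum_i k_i (k_i - 1) / 2 for a given degree sequence."""
    return sum(k * (k - 1) // 2 for k in degree_sequence)

def regular_degrees(n, k):
    """Degree sequence of a k-regular graph on n nodes (assumes n*k is even)."""
    return [k] * n

def single_hub_degrees(n, num_links):
    """One hub of degree L, L leaves of degree 1, all other nodes isolated."""
    if num_links > n - 1:
        raise ValueError("single-hub construction needs at most n - 1 links")
    return [num_links] + [1] * num_links + [0] * (n - 1 - num_links)

# Same number of links (average degree 1), very different numbers of stars.
n = 1000
regular = regular_degrees(n, 1)            # 500 links, no two-stars at all
hub = single_hub_degrees(n, 500)           # 500 links, all attached to one hub
stars_regular = two_stars_from_degrees(regular)
stars_hub = two_stars_from_degrees(hub)    # L (L - 1) / 2 with L = 500
```

At fixed number of links the regular sequence gives the smallest star count and the single hub gives a count that grows quadratically with the number of links, which is the gap the two bounds quantify.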

In this work we consider sparse graphs, where the average connectivity is $\mathcal{O}(1)$. In this case
the total number of edges $L$ is $\mathcal{O}(N)$, which means that the minimum value that the number of
stars can take is $\mathcal{O}(N)$. On the other hand, the maximum-star configuration has an $\mathcal{O}(1)$ number
of vertices with degree $N$, and an $\mathcal{O}(N)$ number of vertices with degree $\mathcal{O}(1)$. The vertices with
degree $N$ each contribute $N(N-1)/2$ to the number of stars, showing that the maximum
density of stars in a finitely connected graph is

$$\frac{S_{\max}}{N} = \mathcal{O}(N). \tag{25}$$
Expansions about mean-field solutions (22), (23) show that both moments $\langle k \rangle$, $\langle k^2 \rangle$ diverge at the
same parameter values, suggesting that it is impossible to achieve the maximum expected star
density for any finite value of connectivity. However, higher order (non-Gaussian) fluctuations
may become dominant in the finite connectivity regime and the expansions (22) and (23) may
become very inaccurate. We will test their accuracy against MCMC simulations and results of the
exact calculation in the next section.
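For orientation, MCMC sampling of the two-star ensemble can be sketched with single-link Metropolis updates. This is an assumption made for illustration: the text does not specify which sampler was used, and all function names below are hypothetical. The sketch targets $p(\mathbf{c}) \propto e^{-H(\mathbf{c})}$ with $H = -\alpha \sum_i k_i - \beta \sum_i k_i^2$ and only needs the degrees of the two end nodes to evaluate the energy change of a proposed link flip.

```python
import math
import random

def total_energy(deg, alpha, beta):
    """H = -alpha * sum_i k_i - beta * sum_i k_i^2."""
    return -alpha * sum(deg) - beta * sum(k * k for k in deg)

def delta_energy_toggle(c_ij, k_i, k_j, alpha, beta):
    """Change in H when link (i, j) is toggled (added if absent, removed if present)."""
    s = 1 if c_ij == 0 else -1
    d_sum_k = 2 * s
    d_sum_k2 = (k_i + s) ** 2 - k_i ** 2 + (k_j + s) ** 2 - k_j ** 2
    return -alpha * d_sum_k - beta * d_sum_k2

def metropolis_sweep(adj, deg, alpha, beta, rng):
    """One sweep: N(N-1)/2 proposed link flips, accepted with prob min(1, e^{-dH})."""
    n = len(deg)
    for _ in range(n * (n - 1) // 2):
        i, j = rng.sample(range(n), 2)           # two distinct nodes
        dh = delta_energy_toggle(adj[i][j], deg[i], deg[j], alpha, beta)
        if dh <= 0 or rng.random() < math.exp(-dh):
            s = 1 - 2 * adj[i][j]                # +1 adds the link, -1 removes it
            adj[i][j] += s
            adj[j][i] += s
            deg[i] += s
            deg[j] += s

def run_chain(n, alpha, beta, sweeps, seed=0):
    """Start from the empty graph and return (adjacency lists, degrees)."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    deg = [0] * n
    for _ in range(sweeps):
        metropolis_sweep(adj, deg, alpha, beta, rng)
    return adj, deg
```

In the finite connectivity regime one would choose $\alpha$ of order $-\frac{1}{2}\log N$ so that the graph stays sparse; equilibration and autocorrelation checks are omitted from this sketch.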

3. Exact solution in the finite connectivity regime 

In this section, we derive an exact solution for the two-star model in the finite connectivity
regime, where mean-field solutions are expected to become inexact. First, we rewrite the
graph Hamiltonian by performing the sum over $k$ in (8) and using the symmetry $c_{ij} = c_{ji}$

$$H(\mathbf{c}) = -2\alpha \sum_{i<j} c_{ij} - \beta \sum_{i<j} c_{ij}\,[k_i(\mathbf{c}) + k_j(\mathbf{c})]. \tag{26}$$

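Hence, we introduce the partition function

$$\begin{aligned}
Z &= \sum_{\mathbf{c}} e^{-H(\mathbf{c})} = \sum_{\mathbf{k}} \sum_{\mathbf{c}} \delta_{\mathbf{k},\mathbf{k}(\mathbf{c})}\, e^{\sum_{i<j} c_{ij}[2\alpha + \beta(k_i + k_j)]} \\
&= \sum_{\mathbf{k}} \int_{-\pi}^{\pi} \frac{d\boldsymbol{\Omega}}{(2\pi)^N}\, e^{i\boldsymbol{\Omega}\cdot\mathbf{k}} \sum_{\mathbf{c}} e^{\sum_{i<j} c_{ij}[2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)]} \\
&= \sum_{\mathbf{k}} \int_{-\pi}^{\pi} \frac{d\boldsymbol{\Omega}}{(2\pi)^N}\, e^{i\boldsymbol{\Omega}\cdot\mathbf{k}} \prod_{i<j} \left[ 1 + e^{2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)} \right]
\end{aligned} \tag{27}$$

where $\delta_{\mathbf{k},\mathbf{k}(\mathbf{c})} = \prod_i \delta_{k_i,k_i(\mathbf{c})}$, with $\delta_{x,y}$ being the Kronecker delta, taking value 1 for $x = y$ and 0
otherwise, and the Fourier representation of the Kronecker delta has been used

$$\delta_{x,y} = \int_{-\pi}^{\pi} \frac{d\Omega}{2\pi}\, e^{i\Omega(x-y)} \tag{28}$$

Next, we focus on the normalised logarithm of (27), giving the free energy density $f = -N^{-1} \log Z$,
which can be calculated exactly, for large $N$, in the finite connectivity regime $\langle c_{ij} \rangle = \mathcal{O}(N^{-1})\ \forall\, i,j$,
by using path integrals.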

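As a first step, we consider the likelihood $\langle c_{ij} \rangle$ for two nodes $i,j$ to be connected. This follows
from the estimate of the average number of links $L = \langle L(\mathbf{c}) \rangle = \sum_{i<j} \langle c_{ij} \rangle$. Using

$$p(\mathbf{c}) = \frac{1}{Z} \sum_{\mathbf{k}} \int_{-\pi}^{\pi} \frac{d\boldsymbol{\Omega}}{(2\pi)^N}\, e^{i\boldsymbol{\Omega}\cdot\mathbf{k}} \prod_{i<j} e^{c_{ij}[2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)]} \tag{29}$$

we have

$$L = \sum_{i<j} \sum_{\mathbf{c}} c_{ij}\, p(\mathbf{c}) = \sum_{i<j} \left\langle \frac{e^{2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)}}{1 + e^{2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)}} \right\rangle_{\mathbf{k},\boldsymbol{\Omega}} \tag{30}$$

where

$$\langle \cdots \rangle_{\mathbf{k},\boldsymbol{\Omega}} = \frac{1}{Z} \sum_{\mathbf{k}} \int_{-\pi}^{\pi} \frac{d\boldsymbol{\Omega}}{(2\pi)^N}\, e^{i\boldsymbol{\Omega}\cdot\mathbf{k}}\, (\cdots) \prod_{k<\ell} \left[ 1 + e^{2\alpha + \beta(k_k + k_\ell) - i(\Omega_k + \Omega_\ell)} \right] \tag{31}$$

Hence, one has

$$\langle c_{ij} \rangle = \left\langle \frac{e^{2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)}}{1 + e^{2\alpha + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)}} \right\rangle_{\mathbf{k},\boldsymbol{\Omega}} \tag{32}$$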


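In the regime $\langle c_{ij} \rangle = \mathcal{O}(N^{-1})$, it is convenient to transform $\alpha \to \hat{\alpha} - \frac{1}{2}\log(N/c)$, with $c = \mathcal{O}(N^0)$,
to make (32) explicitly $\mathcal{O}(1/N)$, so we get

$$\langle c_{ij} \rangle \simeq \frac{c}{N} \left\langle e^{2\hat{\alpha} + \beta(k_i + k_j) - i(\Omega_i + \Omega_j)} \right\rangle_{\mathbf{k},\boldsymbol{\Omega}} \tag{33}$$

and

$$Z \simeq \sum_{\mathbf{k}} \int_{-\pi}^{\pi} \frac{d\boldsymbol{\Omega}}{(2\pi)^N}\, e^{i\boldsymbol{\Omega}\cdot\mathbf{k}} \exp\left[ \frac{c}{N} \sum_{k<\ell} e^{2\hat{\alpha} + \beta(k_k + k_\ell) - i(\Omega_k + \Omega_\ell)} \right] \tag{34}$$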


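It is also useful to express the network distribution $p(\mathbf{c})$ in the scaled parameter $\hat{\alpha}$

$$p(\mathbf{c}) = \frac{1}{Z} \prod_{i<j} \left[ \frac{c}{N}\, e^{2\hat{\alpha} + \beta(k_i(\mathbf{c}) + k_j(\mathbf{c}))} \right]^{c_{ij}} = \frac{1}{Z} \prod_{i<j} \left[ \frac{c}{N}\, e^{2\hat{\alpha} + \beta(k_i(\mathbf{c}) + k_j(\mathbf{c}))}\, \delta_{c_{ij},1} + \delta_{c_{ij},0} \right] \tag{35}$$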

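Next we introduce the following order parameters

$$P(k,\Omega|\mathbf{k},\boldsymbol{\Omega}) = \frac{1}{N} \sum_{r=1}^{N} \delta_{k,k_r}\, \delta(\Omega - \Omega_r) \tag{36}$$

and insert into (34), for each $(k,\Omega)$, the following integral:

$$1 = \int dP(k,\Omega)\, \delta\left[ P(k,\Omega) - P(k,\Omega|\mathbf{k},\boldsymbol{\Omega}) \right] \tag{37}$$

Writing the $\delta$-functions in Fourier representation, with conjugate order parameters $\hat{P}(k,\Omega)$, and
discretizing $\Omega$ in steps of size $\Delta$, which is eventually sent to zero, we can then write the free-energy
as the following path integral, with the short-hand $\{dP\,d\hat{P}\} = \prod_{k,\Omega} [dP(k,\Omega)\,d\hat{P}(k,\Omega)/2\pi]$ and
sums over $\Omega$ transformed into integrals:

$$\begin{aligned}
f &= -\lim_{N\to\infty} \frac{1}{N} \log Z \\
&= -\lim_{N\to\infty} \frac{1}{N} \log \int \{dP\,d\hat{P}\}\; e^{\,iN\Delta \sum_{k\geq 0}\sum_{\Omega} \hat{P}(k,\Omega) P(k,\Omega) + \frac{cN}{2} \Delta^2 \sum_{k,k'\geq 0} \sum_{\Omega,\Omega'} P(k,\Omega) P(k',\Omega')\, e^{2\hat{\alpha} + \beta(k+k') - i(\Omega + \Omega')}} \\
&\qquad\qquad \times\; e^{\,N \log \sum_{k\geq 0} \int_{-\pi}^{\pi} \frac{d\Omega}{2\pi}\, e^{i\Omega k - i\hat{P}(k,\Omega)}} \\
&= -\lim_{N\to\infty} \frac{1}{N} \log \int \{dP\,d\hat{P}\}\; e^{-N\Phi[P,\hat{P}]}
\end{aligned} \tag{38}$$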
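where

$$\Phi[P,\hat{P}] = -i \sum_{k\geq 0} \int_{-\pi}^{\pi} d\Omega\, \hat{P}(k,\Omega) P(k,\Omega) - \frac{c}{2} \sum_{k,k'\geq 0} \int_{-\pi}^{\pi} d\Omega\, d\Omega'\, P(k,\Omega) P(k',\Omega')\, e^{2\hat{\alpha} + \beta(k+k') - i(\Omega + \Omega')} - \log \sum_{k\geq 0} \int_{-\pi}^{\pi} \frac{d\Omega}{2\pi}\, e^{i\Omega k - i\hat{P}(k,\Omega)} \tag{39}$$

For large $N$, we can evaluate the path integral in (38) by steepest descent

$$f = \min_{P,\hat{P}} \Phi[P,\hat{P}] \tag{40}$$

Extremizing the action $\Phi$ over $P$, $\hat{P}$ leads to the saddle-point equations

$$P(k,\Omega) = \frac{e^{i\Omega k - i\hat{P}(k,\Omega)}}{\sum_{k\geq 0} \int_{-\pi}^{\pi} d\Omega\, e^{i\Omega k - i\hat{P}(k,\Omega)}} \tag{41}$$

$$-i\hat{P}(k,\Omega) = c\, e^{\hat{\alpha} + \beta k - i\Omega} \sum_{k'\geq 0} e^{\hat{\alpha} + \beta k'} \int_{-\pi}^{\pi} d\Omega'\, P(k',\Omega')\, e^{-i\Omega'} \tag{42}$$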

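The equation above suggests defining

$$P(k) = \int_{-\pi}^{\pi} d\Omega\, P(k,\Omega)\, e^{-i\Omega} \tag{43}$$

and

$$\gamma(\hat{\alpha},\beta) = \sum_{k\geq 0} e^{\hat{\alpha} + \beta k} P(k) \tag{44}$$

so that

$$-i\hat{P}(k,\Omega) = c\, e^{\hat{\alpha} + \beta k - i\Omega}\, \gamma(\hat{\alpha},\beta) \tag{45}$$

Hence,

$$P(k,\Omega) = \frac{e^{i\Omega k + c\gamma e^{\hat{\alpha} + \beta k} e^{-i\Omega}}}{\sum_{k\geq 0} \int_{-\pi}^{\pi} d\Omega\, e^{i\Omega k + c\gamma e^{\hat{\alpha} + \beta k} e^{-i\Omega}}}$$

and

$$P(k) = \frac{(c\gamma e^{\hat{\alpha} + \beta k})^{k-1}}{(k-1)!} \left[ \sum_{k\geq 0} \frac{(c\gamma e^{\hat{\alpha} + \beta k})^{k}}{k!} \right]^{-1} \theta(k-1) \tag{46}$$

where $\theta(x)$ is the Heaviside step function, taking value 1 for $x \geq 0$ and 0 for $x < 0$. Note that
$P(k)$ is not a distribution because it is not normalised to one; instead we have $\sum_{k\geq 0} P(k) = \langle k\, e^{-(\hat{\alpha} + \beta k)} \rangle/(c\gamma)$,
with

$$\langle \cdots \rangle = \frac{\sum_{k\geq 0} (\cdots)\, (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!}{\sum_{k\geq 0} (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!} \tag{47}$$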
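The step from the saddle-point form of $P(k,\Omega)$ to the expression for $P(k)$ rests on the integral $\int_{-\pi}^{\pi} \frac{d\Omega}{2\pi}\, e^{i\Omega k} \exp(x\, e^{-i\Omega}) = x^k/k!$ for integer $k \geq 0$ (and zero for $k < 0$, which is where the step function comes from). The following is a minimal numerical check of that identity; the function name and grid size are illustrative choices, and the uniform periodic grid is an assumption made because it integrates trigonometric polynomials essentially exactly.

```python
import math
import numpy as np

def fourier_coefficient(x, k, n_grid=4096):
    """Approximate (1/2pi) * integral over [-pi, pi) of e^{i*Omega*k} * exp(x * e^{-i*Omega})."""
    omega = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    integrand = np.exp(1j * omega * k + x * np.exp(-1j * omega))
    return integrand.mean()      # mean over a uniform periodic grid = integral / (2 pi)

# Compare with x^k / k! for a few values, and with 0 for negative k.
value_k3 = fourier_coefficient(1.7, 3)
value_k0 = fourier_coefficient(1.7, 0)
value_neg = fourier_coefficient(1.7, -1)
```

Expanding $\exp(x\,e^{-i\Omega})$ in powers of $e^{-i\Omega}$ shows why only the term of order $k$ survives the integration.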

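The resulting free energy density is

$$f(\hat{\alpha},\beta) = \frac{c}{2}\gamma^2 - \log \sum_{k\geq 0} \frac{(c\gamma e^{\hat{\alpha} + \beta k})^{k}}{k!} \tag{48}$$

where $\gamma$ solves the self-consistency equation

$$c\,\gamma^2(\hat{\alpha},\beta) = \langle k \rangle \tag{49}$$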
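Finally, the ensemble parameters $\hat{\alpha}$, $\beta$ have to be determined from the equations for the
constraints, which are found by taking the derivatives of $f$ with respect to $\hat{\alpha}$, $\beta$ as

$$\langle k \rangle = -\frac{\partial f}{\partial \hat{\alpha}} = \frac{\sum_{k\geq 0} k\, (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!}{\sum_{k\geq 0} (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!} \tag{50}$$

and

$$\langle k^2 \rangle = -\frac{\partial f}{\partial \beta} = \frac{\sum_{k\geq 0} k^2\, (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!}{\sum_{k\geq 0} (c\gamma e^{\hat{\alpha} + \beta k})^{k}/k!} \tag{51}$$

where the last equality in (50) and (51) follows from the saddle-point equation (49). Combining
(50) and (49) we have $\gamma = \sqrt{\langle k \rangle/c}$, yielding the following equations for the ensemble parameters:

$$\langle k \rangle = \frac{\sum_{k\geq 0} k\, \big(\sqrt{c\langle k \rangle}\, e^{\hat{\alpha} + \beta k}\big)^{k}/k!}{\sum_{k\geq 0} \big(\sqrt{c\langle k \rangle}\, e^{\hat{\alpha} + \beta k}\big)^{k}/k!} \tag{52}$$

$$\langle k^2 \rangle = \frac{\sum_{k\geq 0} k^2\, \big(\sqrt{c\langle k \rangle}\, e^{\hat{\alpha} + \beta k}\big)^{k}/k!}{\sum_{k\geq 0} \big(\sqrt{c\langle k \rangle}\, e^{\hat{\alpha} + \beta k}\big)^{k}/k!} \tag{53}$$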

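Hence, we can finally write the free-energy density as

$$f(\langle k \rangle, \langle k^2 \rangle) = \frac{\langle k \rangle}{2} - \sqrt{c\langle k \rangle} - \log \sum_{k\geq 0} e^{\hat{\alpha} k + \beta k^2} g(k) \tag{54}$$

where $\hat{\alpha}$ and $\beta$ solve

$$\langle k \rangle = \frac{\sum_{k\geq 0} k\, g(k)\, e^{\hat{\alpha} k + \beta k^2}}{\sum_{k\geq 0} g(k)\, e^{\hat{\alpha} k + \beta k^2}} \tag{55}$$

$$\langle k^2 \rangle = \frac{\sum_{k\geq 0} k^2\, g(k)\, e^{\hat{\alpha} k + \beta k^2}}{\sum_{k\geq 0} g(k)\, e^{\hat{\alpha} k + \beta k^2}} \tag{56}$$

with

$$g(k) = \frac{\big(\sqrt{c\langle k \rangle}\big)^{k}\, e^{-\sqrt{c\langle k \rangle}}}{k!}. \tag{57}$$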

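In conclusion, we have that for the finitely connected two-star model, with constrained average
connectivity $\langle k \rangle$ and second moment $\langle k^2 \rangle$ of the degrees, the network distribution is

$$p(\mathbf{c}|\langle k \rangle, \langle k^2 \rangle) = \frac{1}{Z} \prod_{i<j} \left[ \frac{c}{N}\, e^{2\hat{\alpha} + \beta(k_i(\mathbf{c}) + k_j(\mathbf{c}))}\, \delta_{c_{ij},1} + \delta_{c_{ij},0} \right] \tag{58}$$

where $\hat{\alpha}$ and $\beta$ are determined from (55) and (56). We note that for the choice $c = \langle k \rangle$, $g(k)$
becomes a Poissonian distribution with parameter $\langle k \rangle$.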

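3.1. Test for $\beta = 0$

First, we check the validity of our equations for the case $\beta = 0$, where we should get back to the
ER graphs. For $\beta = 0$ the equation for the constraint (55) gives

$$\langle k \rangle = c\, e^{2\hat{\alpha}} \tag{59}$$

Substituting in (58) we get

$$p(\mathbf{c}) = \frac{1}{Z} \prod_{i<j} \left[ \frac{\langle k \rangle}{N}\, \delta_{c_{ij},1} + \delta_{c_{ij},0} \right] \tag{60}$$

with

$$Z = \sum_{\mathbf{c}} \prod_{i<j} \left[ \frac{\langle k \rangle}{N}\, \delta_{c_{ij},1} + \delta_{c_{ij},0} \right] = \prod_{i<j} \left( 1 + \frac{\langle k \rangle}{N} \right)$$

yielding, for large $N$,

$$p(\mathbf{c}) = \prod_{i<j} \left[ \frac{\langle k \rangle}{N}\, \delta_{c_{ij},1} + \left( 1 - \frac{\langle k \rangle}{N} \right) \delta_{c_{ij},0} \right] \tag{61}$$

thus recovering the Erdős-Rényi ensemble.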

3.2. Numerical results 


For $\beta \neq 0$, we need to solve equations (55), (56) numerically. We note, however, that for positive
values of $\beta$, the sums on the RHS of these equations do not converge, hence only values $\beta < 0$ are
admissible. The parameter $c$ can be chosen arbitrarily, so we will set it to unity, without loss of
generality.
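One possible way to solve the constraint equations numerically is sketched below. It is an illustration under stated assumptions, not the authors' procedure: it takes the weights in (55), (56) to be proportional to $(\sqrt{c\langle k \rangle})^k e^{\hat{\alpha} k + \beta k^2}/k!$ (any $k$-independent normalisation of $g(k)$ cancels in the ratios), truncates the sums at a hypothetical cut-off `k_max`, and uses fixed-point iteration on $\langle k \rangle$ at given $\hat{\alpha}$, $\beta \leq 0$. All names are illustrative.

```python
import math

def _moments(mean_k, alpha_hat, beta, c, k_max):
    """First two moments under weights sqrt(c*mean_k)^k * exp(alpha_hat*k + beta*k^2) / k!."""
    log_x = 0.5 * math.log(c * mean_k)
    log_w = [k * (log_x + alpha_hat) + beta * k * k - math.lgamma(k + 1)
             for k in range(k_max + 1)]
    shift = max(log_w)                         # subtract the maximum to avoid overflow
    w = [math.exp(v - shift) for v in log_w]
    norm = sum(w)
    m1 = sum(k * wk for k, wk in enumerate(w)) / norm
    m2 = sum(k * k * wk for k, wk in enumerate(w)) / norm
    return m1, m2

def solve_degree_moments(alpha_hat, beta, c=1.0, k_max=500, tol=1e-12, max_iter=10000):
    """Return (<k>, <k^2>) solving the self-consistent constraint equations.

    k_max must be much larger than the resulting <k>, otherwise the truncated
    sums are inaccurate.
    """
    if beta > 0:
        raise ValueError("the sums do not converge for beta > 0")
    mean_k = 1.0                               # arbitrary positive starting point
    for _ in range(max_iter):
        m1, m2 = _moments(mean_k, alpha_hat, beta, c, k_max)
        if abs(m1 - mean_k) < tol:
            return m1, m2
        mean_k = m1
    raise RuntimeError("fixed-point iteration did not converge")

# beta = 0 should give Erdos-Renyi statistics with <k> = c * exp(2 * alpha_hat).
er_mean, er_second = solve_degree_moments(-0.5, 0.0)
narrow_mean, narrow_second = solve_degree_moments(0.5, -0.5)
```

For $\beta < 0$ the resulting second moment stays below $\langle k \rangle^2 + \langle k \rangle$, in line with the statement that the degree statistics are narrower than Poissonian in this regime.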

In Figure 1 we compare theoretical results from (55), (56) (orange symbols) with MCMC
simulations for networks of $N = 3000$ nodes (blue symbols) and predictions from mean-field
theory (22) and (23) (green symbols). Plots show the logarithm of the link and of the star
densities, normalised with the logarithm of the system size, as functions of $\beta$, for fixed values of
$\alpha = -0.5, 4$, corresponding to low and high connectivity respectively. MCMC simulations show a
divergence in the link and star densities for $\beta > 0$, consistent with the fact that the sums in equations (55),
(56) do not converge in this regime, and show excellent agreement with theoretical predictions
at $\beta < 0$. In particular, simulation data lie on top of the theoretical ones at low connectivity (top
panel) and deviations stay within finite size effects at high connectivities (bottom panel). In
contrast, mean-field predictions are seen to perform well at high connectivity but, as expected,
become very inaccurate for small connectivity. In Figure 2 we show three-dimensional plots of the
average connectivity and average density of stars, as functions of $\alpha$ and $\beta$. For
fixed values of $\alpha$, both connectivity and star density are at their highest for $\beta = 0$ and they
decay quickly for $\beta < 0$. Notably, the star density is always finite while the connectivity is finite,
hence the model fails to produce an arbitrary number of stars for any given finite connectivity.
In particular, we find that the star density is always close to its physical minimum. This is better
understood by looking at contour plots of constant average connectivity and constant average
star density in Figure 3. These show that for fixed $\beta$ the connectivity increases with $\alpha$ more quickly
than the star density, so that the star density decreases as we move along the contours of constant
connectivity in the direction of decreasing $\beta$. Hence, the maximum number of stars is obtained
for $\beta = 0$, which corresponds to Erdős-Rényi graphs, satisfying $\langle k^2 \rangle - \langle k \rangle = \langle k \rangle^2$, so that the
contours of constant connectivity and star density coincide. The reason for this behavior can be
understood by looking at equation (46), showing that for $\beta \leq 0$, $P(k)$ is Poissonian ($\beta = 0$) or
narrower ($\beta < 0$), whereas for $\beta > 0$, $P(k)$ is not normalizable, with an asymptotic behavior for
large $k$ given by $P(k) \sim e^{k(\beta k - \log k)}$, independent of $\alpha$.


This shows that for $\langle k^2 \rangle > \langle k \rangle^2 + \langle k \rangle$, equations (55) and (56) have no solution. We will
see in the next section that in this range of the imposed constraints, the partition function can
no longer be calculated by the saddle-point method illustrated above and a different approach
Figure 1. Plot of $\log\langle k \rangle/\log N$ (left panels) and $\log(\langle k^2 \rangle - \langle k \rangle)/\log N$ (right panels) as functions
of the ensemble parameter $\beta$, for $\alpha = -0.5$ (top panels) and $\alpha = 4$ (bottom panels) and $c = 1$.
MCMC denotes Monte Carlo simulations for $N = 3000$, FC denotes exact results from formulae
(55), (56) and PN denotes predictions from expansions about mean-field theory (22), (23).

Figure 2. Plot of the average connectivity (left panel) and the average density of stars (right
panel) as functions of the ensemble parameters $\alpha$, $\beta$.

Figure 3. Contours of constant average connectivity and average star density in the space of ensemble
parameters $\alpha$, $\beta$.



































is needed. This leads to a phase diagram in the $\langle k \rangle$-$\langle k^2 \rangle$ plane with two phases which display
different behaviors of the partition function $Z$ in the large-$N$ limit. The critical line separating
the two phases is found by solving (55) and (56) for $\beta = 0$, which gives $\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$. For
$\langle k^2 \rangle < \langle k \rangle^2 + \langle k \rangle$

$$Z(N\langle k \rangle, N\langle k^2 \rangle) \sim e^{-N f(\langle k \rangle, \langle k^2 \rangle)} \tag{62}$$

where $f(\langle k \rangle, \langle k^2 \rangle)$ is given by (54), whereas for $\langle k^2 \rangle > \langle k \rangle^2 + \langle k \rangle$ the asymptotic behavior of
$Z$ is given by (69). We will show in the next section that the phase transition occurring at
$\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$ is a condensation from a liquid to a condensed phase, due to a large deviation
of sums of random variables. We note that condensation may be avoided in the scaling regime
$\beta = \mathcal{O}(\log N/N)$, the same regime where the Strauss model is known to display non-trivial behavior [11].


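4. Condensation as large deviation of sums of random degrees

In this section we show that the condensation transition occurring at $\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$ is related to
a large deviation of sums of random degrees and we provide the asymptotics of $Z$ in the condensate
phase $\langle k^2 \rangle > \langle k \rangle^2 + \langle k \rangle$. To this purpose, it is convenient to use an alternative but equivalent
definition of the two-star model, where the constraints on the links and stars are implemented
directly in the network distribution

$$p(\mathbf{c}|\langle k \rangle, \langle k^2 \rangle) = \frac{1}{Z(N\langle k \rangle, N\langle k^2 \rangle)} \prod_{i<j} \left[ \frac{c}{N}\, \delta_{c_{ij},1} + \left( 1 - \frac{c}{N} \right) \delta_{c_{ij},0} \right] \delta_{\sum_i k_i(\mathbf{c}),\, N\langle k \rangle}\; \delta_{\sum_i k_i^2(\mathbf{c}),\, N\langle k^2 \rangle} \tag{63}$$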


via Kronecker deltas, and links are otherwise drawn randomly and independently with likelihood
$c/N$. By using path integrals and the saddle-point method used in Section 3, we find that in
the fluid phase $\langle k^2 \rangle < \langle k \rangle^2 + \langle k \rangle$ the partition function $Z(N\langle k \rangle, N\langle k^2 \rangle)$ has the large deviation
behavior (62) with rate function given by the free-energy density (see Appendix B for full details)

$$f(\langle k \rangle, \langle k^2 \rangle) = \alpha\langle k \rangle + \beta\langle k^2 \rangle - \log \sum_{k\geq 0} e^{\alpha k + \beta k^2} g(k) \tag{64}$$

where $\alpha$, $\beta$ are determined from equations (55), (56) and $g(k)$ is defined in (57). Up to additive
constants, the free-energy density (64) is identical to (54), hence the definition (63) of the two-star
model is thermodynamically equivalent to the standard definition. The marginal distribution
$p(k)$ in the fluid phase is given by (see Appendix C for derivations)

$$p(k) = \frac{g(k)\, e^{\alpha k + \beta k^2}}{\sum_{k'\geq 0} g(k')\, e^{\alpha k' + \beta k'^2}} \tag{65}$$

showing that all random degrees contribute to the sums $\sum_i k_i$ and $\sum_i k_i^2$ with small values.

Notably, the free-energy (64) and the marginal distribution (65) are identical to those of
the system of independently and identically distributed random variables $\mathbf{k} = \{k_1, \dots, k_N\}$ with
“bare” distribution $\hat{g}(k) = g(k)/\sum_k g(k)$ and global constraints on the average and the variance,
introduced in [25]

$$p(\mathbf{k}|\langle k \rangle, \langle k^2 \rangle) = \frac{1}{Z(N\langle k \rangle, N\langle k^2 \rangle)} \prod_{i=1}^{N} \hat{g}(k_i)\; \delta_{\sum_{i=1}^{N} k_i,\, N\langle k \rangle}\; \delta_{\sum_{i=1}^{N} k_i^2,\, N\langle k^2 \rangle} \tag{66}$$
The connection between the two models is transparent: a graph drawn from (63) will have a degree
sequence $\mathbf{k}$ where the degrees are random variables drawn independently from the Poissonian
distribution $g(k)$ with average $\sqrt{c\langle k \rangle}$, subject to the two global constraints. For the choice
$c = \langle k \rangle$ each degree will be a Poissonian variable with average $\langle k \rangle$, subject to constraints.

Systems with factorised steady states ( 66 ) are known to display condensation for non-heavy- 
tailed distributions g{k) ~ with 1 < 7 < 2 , when conditioned on large deviations of their 

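linear statistics, i.e. for $\langle k^2 \rangle$ larger than a critical value $\sigma(\langle k \rangle)$, which is found by solving (55)
and (56) for $\beta = 0$ [25], i.e. from

$$\sigma(\langle k \rangle) = \frac{\sum_k k^2\, g(k)\, e^{\alpha k}}{\sum_k g(k)\, e^{\alpha k}} \tag{67}$$

with $\alpha$ solving

$$\langle k \rangle = \frac{\sum_k k\, g(k)\, e^{\alpha k}}{\sum_k g(k)\, e^{\alpha k}} \tag{68}$$

For $g(k)$ defined in (57), the critical value $\sigma(\langle k \rangle)$ can be calculated exactly: equation (68) gives
$\alpha = \ln\sqrt{\langle k \rangle/c}$, and substituting in (67) we obtain $\sigma(\langle k \rangle) = \langle k \rangle^2 + \langle k \rangle$.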
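The location of the critical line can be checked numerically by evaluating the ratios of sums directly. This is a minimal sketch with hypothetical names, assuming $g(k) \propto (\sqrt{c\langle k \rangle})^k/k!$ (the normalisation cancels) and a cut-off `k_max` much larger than $\langle k \rangle$: it inserts $\alpha = \ln\sqrt{\langle k \rangle/c}$ and returns the first and second moments of the tilted distribution.

```python
import math

def critical_moments(mean_k, c=1.0, k_max=300):
    """Moments of g(k) * e^{alpha k} at alpha = ln sqrt(<k>/c), with g(k) ~ sqrt(c<k>)^k / k!."""
    alpha = 0.5 * math.log(mean_k / c)
    log_x = 0.5 * math.log(c * mean_k) + alpha     # log of sqrt(c<k>) * e^alpha
    log_w = [k * log_x - math.lgamma(k + 1) for k in range(k_max + 1)]
    shift = max(log_w)                             # avoid overflow in exp
    w = [math.exp(v - shift) for v in log_w]
    norm = sum(w)
    first = sum(k * wk for k, wk in enumerate(w)) / norm
    second = sum(k * k * wk for k, wk in enumerate(w)) / norm
    return first, second

first_a, sigma_a = critical_moments(2.5, c=1.0)
first_b, sigma_b = critical_moments(2.5, c=3.0)
```

The first moment reproduces the imposed $\langle k \rangle$ and the second moment equals $\langle k \rangle^2 + \langle k \rangle$ irrespective of $c$, since the tilted distribution is Poissonian with parameter $\langle k \rangle$.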

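For $\langle k^2 \rangle > \sigma(\langle k \rangle)$, the method developed in [25] yields, for the function (57) with asymptotics
$g(k) \sim e^{-k\log k}$, the following asymptotic behaviour for the partition function

$$Z(N\langle k \rangle, N\langle k^2 \rangle) \sim e^{-N I(\langle k \rangle) - [N\langle k^2 \rangle - N\sigma(\langle k \rangle)]^{1/2} \log [N\langle k^2 \rangle - N\sigma(\langle k \rangle)]^{1/2}} \tag{69}$$

with

$$I(\langle k \rangle) = \alpha\langle k \rangle - \log \sum_{k\geq 0} g(k)\, e^{\alpha k}$$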
and $\alpha$ determined from (68). In this regime, the marginal $p(k)$ shows a bump at $k \sim \sqrt{N}$,
meaning that there is a condensate of size $\sim\sqrt{N}$ residing on a single site. Sampling networks
from the two-star distribution (63) will then lead to graphs which have a bulk of homogeneous
degrees and one single site accounting for all the degree heterogeneity in the network, making this
model unsuitable as a null model for finitely connected random graphs with constrained average
degree and variance. In addition, although Markov processes generating random variables $\mathbf{k}$ under
the two global constraints can be constructed as in [25], the definition of algorithms to sample
networks $\mathbf{c}$ from the measure (63) with the two hard constraints on degree average and variance,
in an efficient and unbiased way, poses a challenge in its own right. We note that by removing the
hard constraint on the variance we obtain the system with factorised states studied in [18]

$$p(\mathbf{k}|\langle k \rangle) = \frac{1}{Z(N\langle k \rangle)} \prod_{i=1}^{N} g(k_i)\; \delta_{\sum_{i=1}^{N} k_i,\, N\langle k \rangle} \tag{70}$$

which is known to exhibit condensation only for heavy-tailed bare distributions $g(k) \sim A k^{-\gamma}$,
with $\gamma > 2$, for $\langle k \rangle < \sum_{k\geq 0} k\, g(k)$. For these models, the “dressed” marginal distribution is found
to be [18]

$$p(k) = \frac{g(k)\, e^{\alpha k}}{\sum_k g(k)\, e^{\alpha k}} \tag{71}$$

with $\alpha$ solving $\langle k \rangle = \sum_k k\, p(k)$. This suggests that condensation transitions in the two-star model
may be avoided by removing the hard constraint on the variance and choosing the bare distribution
























$g(k)$ in such a way that the “dressed” marginal displays the desired variance $\langle k^2 \rangle = \sum_k k^2 p(k)$,
while satisfying $\sum_{k\geq 0} k\, g(k) < \langle k \rangle$.

5. Conclusion 

In this work we analysed the finitely connected 2-star model. This model has been solved
analytically within mean-field approximations, which predict a second-order phase transition
between a symmetric and a symmetry-broken phase and a first-order transition in the link density
occurring along the critical line separating the symmetry-broken phases, in the ensemble parameter
space. In this work we solved the 2-star model exactly, in the thermodynamic limit, in the finite
connectivity regime, where mean-field approximations are shown to become inexact. Our results
show that in the thermodynamic limit the system undergoes a condensation transition, from
a liquid to a condensed phase, related to large deviations in the sum of the random degrees,
induced by the global constraints on their linear statistics. We showed that in the liquid phase,
the degree statistics exhibited by the model always lie between those of regular and Erdős-Rényi graphs.
Our results are in excellent agreement with MCMC simulations and are compared with existing
results from mean-field theory and expansions around it. The latter become inaccurate for small
connectivity, albeit providing the correct location of the critical line. When the global constraints
imposed on the linear statistics demand degree statistics more heterogeneous than those of Erdős-Rényi
graphs, the model undergoes a condensation transition, whereby a condensate residing on a single
site appears in the system, which accounts for all the degree heterogeneity of the network, while
the bulk of the network has homogeneous degrees.

We conclude that the finitely connected 2-star model is unsuitable, in its standard definition, as a null model for real networks with prescribed connectivity and link density. We suggest possible modifications of the model which may lead to non-trivial degree statistics in the finite connectivity regime; these entail either a different scaling of the Lagrange multiplier $\beta$ constraining the stars, or the use of a soft constraint for the star density while the link density is subject to a hard constraint.
Appendix A. Mean-field predictions in the finitely connected regime


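In the low connectivity regime we demand that $p = c/N$ with $c = O(N^0)$. To ensure that, we see from (19) that we have to set $\beta = O(1)$ and scale the ensemble parameter $\alpha$ as $\alpha \to \tilde{\alpha} - \frac{1}{2}\log N$, so that the RHS of (19) becomes

$$\frac{1}{2}\left[\tanh\left(\beta c + \tilde{\alpha} - \frac{1}{2}\log N\right) + 1\right] = \frac{1}{2}\left[\frac{\frac{1}{N}e^{2\beta c + 2\tilde{\alpha}} - 1}{\frac{1}{N}e^{2\beta c + 2\tilde{\alpha}} + 1} + 1\right] = \frac{\frac{1}{N}e^{2\beta c + 2\tilde{\alpha}}}{\frac{1}{N}e^{2\beta c + 2\tilde{\alpha}} + 1} \simeq \frac{e^{2\beta c + 2\tilde{\alpha}}}{N} \tag{A.1}$$

where the last equality holds for large $N$ and yields $p = O(N^{-1})$ as required. Equating this to the LHS of (19) we get the self-consistency equation

$$c = e^{2\beta c + 2\tilde{\alpha}}. \tag{A.2}$$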


A graphical analysis of (A.2) reveals that for $\beta < 0$ there is only one solution, whereas for $\beta > 0$ there can be one, two or no solutions depending on the choice of $\tilde{\alpha}$ and $\beta$. To identify the regions of the parameters for which solutions exist, note that for $\beta > 0$ the RHS of (A.2) is convex and strictly increasing in $c$. Let $c^*$ be the value of $c$ for which $\frac{d}{dc}\mathrm{RHS} = 1$. If the value of the RHS at $c^*$ is greater than $c^*$ then there can be no solutions of the self-consistency equation, while if it is less than $c^*$ there are two, and there is only one solution if the RHS equals $c^*$. We find that $c^*$ is given by

$$2\beta\, e^{2\beta c^* + 2\tilde{\alpha}} = 1. \tag{A.3}$$

There exists only one solution when $c^* = e^{2\beta c^* + 2\tilde{\alpha}}$, which using the defining equation of $c^*$ (A.3) simplifies to $c^* = 1/(2\beta)$. Inserting in (A.3) we find that there is a unique solution for $\beta = e^{-1-2\tilde{\alpha}}/2$. We have no solutions when $c^* < e^{2\beta c^* + 2\tilde{\alpha}} = 1/(2\beta)$, which inserted in (A.3) gives $1 < 2\beta e^{1+2\tilde{\alpha}}$, hence $\beta > e^{-1-2\tilde{\alpha}}/2$. Finally we have two solutions for $c^* > e^{2\beta c^* + 2\tilde{\alpha}}$, that is in the region $\beta < e^{-1-2\tilde{\alpha}}/2$. This implies that at $\beta = e^{-1-2\tilde{\alpha}}/2$ the only solution of (A.2) is $c = e^{1+2\tilde{\alpha}}$, whereas for $\beta < e^{-1-2\tilde{\alpha}}/2$ there are two solutions, one smaller and one greater than $e^{1+2\tilde{\alpha}}$. As $\beta$ approaches zero the greater solution tends to infinity while the lower solution tends to $e^{2\tilde{\alpha}}$.

From (17) we obtain the factor $1/(1-p)$, which substituted in (14) gives for the free-energy density $f = F/N = \ln(1-p)(N-1)/2 \simeq -c/2$, for $p = c/N$ and $N \gg c$, showing that the free energy decreases as the connectivity increases. Hence, the stable solution for $0 < \beta < e^{-1-2\tilde{\alpha}}/2$ will be the one with higher connectivity. In conclusion, the mean-field theory in the finite connectivity regime predicts a critical line at $\beta = 0$ where the average connectivity jumps from $O(1)$ to $O(N)$ values. Although the mean-field approximation is invalid in the finite connectivity regime, it picks up the correct location of the critical line $\beta = 0$ found from the exact analysis.
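The counting of solutions of the self-consistency equation $c = e^{2\beta c + 2\tilde{\alpha}}$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: it brackets the roots using the tangency point $c^*$ and refines them by bisection. The function names and the sample parameter values are illustrative assumptions.

```python
import math

def critical_beta(alpha_t):
    """Value of beta at which the two roots merge: exp(-1 - 2*alpha_t) / 2."""
    return math.exp(-1.0 - 2.0 * alpha_t) / 2.0

def gap(c, beta, alpha_t):
    """RHS minus LHS of the self-consistency equation c = exp(2*beta*c + 2*alpha_t)."""
    return math.exp(2.0 * beta * c + 2.0 * alpha_t) - c

def bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solutions(beta, alpha_t):
    """Sorted list of non-negative solutions c of the self-consistency equation."""
    f = lambda c: gap(c, beta, alpha_t)
    if beta <= 0.0:
        # gap is decreasing in c, positive at c = 0 and non-positive at c = exp(2*alpha_t)
        return [bisect(f, 0.0, math.exp(2.0 * alpha_t))]
    # tangency point c*, where the slope of the RHS equals 1
    c_star = -(2.0 * alpha_t + math.log(2.0 * beta)) / (2.0 * beta)
    if f(c_star) > 0.0:
        return []            # the RHS stays above the diagonal
    if f(c_star) == 0.0:
        return [c_star]      # tangent case, beta equal to critical_beta
    hi = 2.0 * c_star
    while f(hi) < 0.0:       # expand until the upper root is bracketed
        hi *= 2.0
    return [bisect(f, 0.0, c_star), bisect(f, c_star, hi)]

for beta in (-0.5, 0.001, 0.1, 0.3):
    print(beta, solutions(beta, 0.0))
```

For $\tilde{\alpha} = 0$ the merging point is $\beta = e^{-1}/2 \approx 0.184$, so the sample values above fall in the one-solution, two-solution and no-solution regions respectively.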


Appendix B. Calculation of the partition function in the liquid phase 


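In this section we calculate the normalising constant $Z(N\langle k\rangle, N\langle k^2\rangle)$ of the distribution (63). First of all, we use the identity $1 = \sum_{\mathbf{k}} \delta_{\mathbf{k},\mathbf{k}(\mathbf{c})}$ to write

$$Z(c\,|\,N\langle k\rangle, N\langle k^2\rangle) = \sum_{\mathbf{k}}\sum_{\mathbf{c}} \delta_{\mathbf{k},\mathbf{k}(\mathbf{c})} \prod_{i<j}\left[\frac{c}{N}\delta_{c_{ij},1} + \left(1-\frac{c}{N}\right)\delta_{c_{ij},0}\right] \delta_{\sum_i k_i,\, N\langle k\rangle}\, \delta_{\sum_i k_i^2,\, N\langle k^2\rangle} \tag{B.1}$$

and the Fourier representation of the Kronecker deltas, giving

$$Z(c\,|\,N\langle k\rangle, N\langle k^2\rangle) = \int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{(2\pi)^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle} \sum_{\mathbf{k}} e^{-i\omega\sum_i k_i - i\omega'\sum_i k_i^2} \int_{-\pi}^{\pi}\prod_i\left[\frac{d\Omega_i}{2\pi}\, e^{i\Omega_i k_i}\right] \sum_{\mathbf{c}}\prod_{i<j}\left[\frac{c}{N}\delta_{c_{ij},1} + \left(1-\frac{c}{N}\right)\delta_{c_{ij},0}\right] e^{-i c_{ij}(\Omega_i+\Omega_j)}. \tag{B.2}$$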
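We next introduce the following order parameters

$$P(\Omega\,|\,\boldsymbol{\Omega}) = \frac{1}{N}\sum_{r=1}^{N}\delta(\Omega - \Omega_r) \tag{B.3}$$

and insert into (B.2), for each $\Omega$, the following integral:

$$1 = \int dP(\Omega)\, \delta\left[P(\Omega) - P(\Omega\,|\,\boldsymbol{\Omega})\right] = \frac{N}{2\pi}\int dP(\Omega)\, d\hat{P}(\Omega)\, e^{iN\hat{P}(\Omega)\left[P(\Omega) - P(\Omega|\boldsymbol{\Omega})\right]}. \tag{B.4}$$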

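Discretizing $\Omega$ in steps of size $\Delta$, which is eventually sent to zero, and proceeding as in the main text, we can then write the free-energy density as the path integral

$$\begin{aligned} f &= -\lim_{N\to\infty}\frac{1}{N}\log\int\{dP\, d\hat{P}\}\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle}\, e^{iN\int d\Omega\, \hat{P}(\Omega)P(\Omega) + \frac{cN}{2}\int d\Omega\, d\Omega'\, P(\Omega)P(\Omega')\left(e^{-i(\Omega+\Omega')} - 1\right)} \\ &\qquad\times \prod_i\sum_{k_i}\int_{-\pi}^{\pi}\frac{d\Omega_i}{2\pi}\, e^{i\Omega_i k_i - i\omega k_i - i\omega' k_i^2 - i\hat{P}(\Omega_i)} \\ &= -\lim_{N\to\infty}\frac{1}{N}\log\int\{dP\, d\hat{P}\}\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{-N\Phi(P,\hat{P},\omega,\omega')} \end{aligned} \tag{B.5}$$

where, using the normalisation $\int d\Omega\, P(\Omega) = 1$,

$$\Phi(P,\hat{P},\omega,\omega') = -i\omega\langle k\rangle - i\omega'\langle k^2\rangle - i\int d\Omega\, \hat{P}(\Omega)P(\Omega) - \frac{c}{2}\int d\Omega\, d\Omega'\, P(\Omega)P(\Omega')\, e^{-i(\Omega+\Omega')} + \frac{c}{2} - \log\sum_{k\geq 0}\int_{-\pi}^{\pi}\frac{d\Omega}{2\pi}\, e^{i\Omega k - i\omega k - i\omega' k^2 - i\hat{P}(\Omega)}. \tag{B.6}$$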

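For large $N$, we can evaluate the integral by steepest descent

$$f = \min_{P,\hat{P},\omega,\omega'} \Phi(P,\hat{P},\omega,\omega'). \tag{B.7}$$

Extremizing the action $\Phi$ over $P, \hat{P}$ we obtain, proceeding as in the main text,

$$c\gamma^2 = \langle k\rangle_\gamma, \tag{B.8}$$

where $\gamma = \int d\Omega\, P(\Omega)\, e^{-i\Omega}$, with

$$\langle k^n\rangle_\gamma = \frac{\sum_{k\geq 0} k^n (c\gamma)^k e^{-i\omega k - i\omega' k^2}/k!}{\sum_{k\geq 0} (c\gamma)^k e^{-i\omega k - i\omega' k^2}/k!}. \tag{B.9}$$

Extremizing $\Phi$ over $\omega, \omega'$ we obtain

$$\langle k\rangle = \langle k\rangle_\gamma, \qquad \langle k^2\rangle = \langle k^2\rangle_\gamma. \tag{B.10}$$

Substituting the saddle point equations in the free energy we have

$$f = -i\omega\langle k\rangle - i\omega'\langle k^2\rangle - \log\sum_{k\geq 0} e^{-i\omega k - i\omega' k^2}\, g(k) \tag{B.11}$$

with

$$g(k) = \frac{(c\langle k\rangle)^{k/2}\, e^{-(c+\langle k\rangle)/2}}{k!} \tag{B.12}$$

and $\omega, \omega'$ determined from

$$\langle k\rangle = \frac{\sum_{k\geq 0} k\, g(k)\, e^{-i\omega k - i\omega' k^2}}{\sum_{k\geq 0} g(k)\, e^{-i\omega k - i\omega' k^2}}, \tag{B.13}$$

$$\langle k^2\rangle = \frac{\sum_{k\geq 0} k^2\, g(k)\, e^{-i\omega k - i\omega' k^2}}{\sum_{k\geq 0} g(k)\, e^{-i\omega k - i\omega' k^2}}. \tag{B.14}$$

Setting $-i\omega = \alpha$, $-i\omega' = \beta$ yields (64).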

Appendix C. Calculation of the marginal distribution in the liquid phase 

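In this section we calculate the marginal distribution

$$p(k\,|\,\mathbf{c}) = \frac{1}{N}\sum_i \delta_{k,k_i(\mathbf{c})}.$$

In the limit of large $N$, we expect this quantity to converge to its ensemble average

$$\begin{aligned} p(k) = \langle p(k\,|\,\mathbf{c})\rangle &= \frac{1}{NZ}\sum_i\int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle}\sum_{\mathbf{k}} e^{-i\omega\sum_j k_j - i\omega'\sum_j k_j^2} \\ &\qquad\times \sum_{\mathbf{c}}\prod_{r<s}\left[\frac{c}{N}\delta_{c_{rs},1} + \left(1-\frac{c}{N}\right)\delta_{c_{rs},0}\right]\delta_{\mathbf{k},\mathbf{k}(\mathbf{c})}\, e^{-ix\sum_j c_{ij}} \end{aligned} \tag{C.1}$$

$$\begin{aligned} &= \frac{1}{NZ}\sum_i\int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle}\sum_{\mathbf{k}} e^{-i\omega\sum_j k_j - i\omega'\sum_j k_j^2}\int_{-\pi}^{\pi}\prod_j\left[\frac{d\Omega_j}{2\pi}\, e^{i\Omega_j k_j}\right] \\ &\qquad\times \exp\left\{\frac{c}{N}\sum_{r<s}\left[e^{-i(\Omega_r+\Omega_s) - ix(\delta_{ir}+\delta_{is})} - 1\right]\right\} \end{aligned} \tag{C.2}$$

where $Z$ denotes the normalising constant calculated in Appendix B and $r, s$ label sites.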

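Next we work out the curly brackets as

$$\frac{c}{N}\sum_{r<s}\left[e^{-i(\Omega_r+\Omega_s) - ix(\delta_{ir}+\delta_{is})} - 1\right] = \frac{c}{N}\sum_{r<s}\left\{e^{-i(\Omega_r+\Omega_s)}\left[\left(e^{-ix} - 1\right)(\delta_{is}+\delta_{ir}) + 1\right] - 1\right\}. \tag{C.3}$$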

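Inserting the order parameter (B.3) via path integrals as done in Appendix B we arrive at

$$p(k) = \frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\; \frac{\displaystyle\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle}\int\{dP\, d\hat{P}\}\, e^{iN\int d\Omega\, \hat{P}(\Omega)P(\Omega) + \frac{cN}{2}\int d\Omega\, d\Omega'\, P(\Omega)P(\Omega')\left(e^{-i(\Omega+\Omega')} - 1\right)}\prod_j\sum_{k_j}\int\frac{d\Omega_j}{2\pi}\, e^{i\Omega_j k_j - i\omega k_j - i\omega' k_j^2 - i\hat{P}(\Omega_j) + \delta_{ij}\, c\, e^{-i\Omega_i}\left(e^{-ix} - 1\right)\int d\Omega\, P(\Omega)e^{-i\Omega}}}{\displaystyle\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\, e^{i\omega N\langle k\rangle + i\omega' N\langle k^2\rangle}\int\{dP\, d\hat{P}\}\, e^{iN\int d\Omega\, \hat{P}(\Omega)P(\Omega) + \frac{cN}{2}\int d\Omega\, d\Omega'\, P(\Omega)P(\Omega')\left(e^{-i(\Omega+\Omega')} - 1\right)}\prod_j\sum_{k_j}\int\frac{d\Omega_j}{2\pi}\, e^{i\Omega_j k_j - i\omega k_j - i\omega' k_j^2 - i\hat{P}(\Omega_j)}} \tag{C.4}$$

$$= \int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\; \frac{\displaystyle\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\int\{dP\, d\hat{P}\}\, e^{-N\Phi(P,\hat{P},\omega,\omega')}\; \frac{\sum_q\int d\Omega\, e^{i\Omega q - i\omega q - i\omega' q^2 - i\hat{P}(\Omega) + c\, e^{-i\Omega}\left(e^{-ix} - 1\right)\int d\Omega'\, P(\Omega')e^{-i\Omega'}}}{\sum_q\int d\Omega\, e^{i\Omega q - i\omega q - i\omega' q^2 - i\hat{P}(\Omega)}}}{\displaystyle\int_{-\pi}^{\pi}\frac{d\omega\, d\omega'}{4\pi^2}\int\{dP\, d\hat{P}\}\, e^{-N\Phi(P,\hat{P},\omega,\omega')}} \tag{C.5}$$

where $\Phi(P,\hat{P},\omega,\omega')$ is as in (B.6). We next calculate the integrals over $\omega, \omega', P, \hat{P}$ by steepest descent. At the saddle point we have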

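$$-i\hat{P}(\Omega) = \sqrt{c\langle k\rangle}\, e^{-i\Omega} \tag{C.6}$$

and we obtain

$$p(k) = \int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\; \frac{\sum_q\int d\Omega\, e^{i\Omega q - i\omega q - i\omega' q^2 + \sqrt{c\langle k\rangle}\, e^{-i(\Omega + x)}}}{\sum_q\int d\Omega\, e^{i\Omega q - i\omega q - i\omega' q^2 + \sqrt{c\langle k\rangle}\, e^{-i\Omega}}} \tag{C.7}$$

where $\omega, \omega'$ solve (B.13), (B.14). Performing the $\Omega$-integrals we have

$$p(k) = \int_{-\pi}^{\pi}\frac{dx}{2\pi}\, e^{ixk}\; \frac{\sum_q e^{-i\omega q - i\omega' q^2 - ixq}\,(c\gamma)^q/q!}{\sum_q e^{-i\omega q - i\omega' q^2}\,(c\gamma)^q/q!}$$

with $c\gamma = \sqrt{c\langle k\rangle}$, and carrying out the integration over $x$ finally gives

$$p(k) = \frac{e^{-i\omega k - i\omega' k^2}\,(c\langle k\rangle)^{k/2}/k!}{\sum_q e^{-i\omega q - i\omega' q^2}\,(c\langle k\rangle)^{q/2}/q!} = \frac{g(k)\, e^{-i\omega k - i\omega' k^2}}{\sum_q g(q)\, e^{-i\omega q - i\omega' q^2}} \tag{C.8}$$

with $g(k)$ defined in (B.12). Setting $-i\omega = \alpha$, $-i\omega' = \beta$ finally yields (65).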
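The liquid-phase marginal can be explored numerically. The sketch below assumes the form $p(k) \propto (c\langle k\rangle)^{k/2}\, e^{\alpha k + \beta k^2}/k!$ with real multipliers $\alpha = -i\omega$ and $\beta = -i\omega' \leq 0$. For a fixed $\beta$ it solves for $\alpha$ by bisection so that the mean matches $\langle k\rangle$, and then reports the variance. The values `c = 4`, `kbar = 3`, the cutoff `kmax` and all function names are illustrative assumptions, not values from the paper.

```python
import math

def marginal(c, kbar, alpha, beta, kmax=200):
    """p(k) proportional to (c*kbar)^(k/2) * exp(alpha*k + beta*k^2) / k!, for 0 <= k <= kmax."""
    log_lam = 0.5 * math.log(c * kbar)
    logw = [k * log_lam - math.lgamma(k + 1) + alpha * k + beta * k * k
            for k in range(kmax + 1)]
    shift = max(logw)                      # subtract the maximum for numerical stability
    w = [math.exp(x - shift) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def mean_and_variance(p):
    m1 = sum(k * pk for k, pk in enumerate(p))
    m2 = sum(k * k * pk for k, pk in enumerate(p))
    return m1, m2 - m1 * m1

def solve_alpha(c, kbar, beta, lo=-30.0, hi=30.0, iters=200):
    """Bisection for alpha at fixed beta <= 0; the mean of p(k) increases with alpha."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_and_variance(marginal(c, kbar, mid, beta))[0] < kbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c, kbar = 4.0, 3.0
for beta in (0.0, -0.2, -1.0):
    alpha = solve_alpha(c, kbar, beta)
    mean, var = mean_and_variance(marginal(c, kbar, alpha, beta))
    print(f"beta = {beta:5.2f}  alpha = {alpha:8.4f}  mean = {mean:.6f}  variance = {var:.6f}")
```

At $\beta = 0$ the marginal is Poissonian, so the variance equals the mean, as for Erdős-Rényi graphs. Negative $\beta$ narrows the distribution at fixed mean, in line with the statement that the liquid phase interpolates between regular and Erdős-Rényi degree statistics.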
















